The general aim of the research reported in part in this paper is the development of a prototype computerized abstractor's assistant. As a kind of writer's assistant, such a software package should encompass a simple word processor and other general writer's tools (Kozma, 1991). In addition, the package should integrate tools, such as an automatic extractor, related specifically to the task of abstracting.
Abstracting assistance features are being prototyped in a text network management system known as TEXNET (Craven, 1988; Craven, 1991b). Among other options in TEXNET, the abstractor can choose to be presented with full-text words that he or she is likely to want to extract verbatim. These are determined automatically on the basis of frequency, with stop-words omitted. The abstractor can highlight desired words from this list, and they are automatically inserted into the abstract in the order in which they were selected.
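The paper specifies TEXNET's extractor only as frequency-based with stop-words omitted; a minimal sketch of such an extractor, assuming a simple regular-expression tokenizer and an illustrative stop-word list (both hypothetical, not TEXNET's actual ones), might look like this:

```python
from collections import Counter
import re

# Illustrative stop-word list; TEXNET's actual list is not given in the paper.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
              "that", "for", "it", "on", "with", "as", "by", "was", "this"}

def candidate_keywords(full_text, min_count=8):
    """Return non-stop-words occurring at least min_count times,
    most frequent first (8 is the threshold used in the experiment)."""
    words = re.findall(r"[a-z']+", full_text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [w for w, n in counts.most_common() if n >= min_count]
```

The abstractor would then be shown this list and could highlight words for verbatim insertion.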
At the initial stage of development, attention was concentrated on keywords, rather than longer phrases, for two reasons. First, an earlier study (Craven, 1991a) had shown that abstracts made little use of longer verbatim word sequences from full texts. Second, keyword extraction is a somewhat simpler task than phrase extraction, though methods for efficient phrase extraction do exist, as in INDEX (Jones et al., 1990), FASIT (Burgin & Dillon, 1992), CLARIT (Paijmans, 1993), and the work reported by Fagan (1989).
Recently (Craven, in press) a phrase display option has also been added to TEXNET. Automatically selected phrases are displayed in a compact format that takes account of overlaps among them.
Simplification of the package was undertaken in order to restrict subjects' choices for purposes of experimental control. The chief modification restricted the displays presented to three: the full text, the abstract being written, and one other display determined by the experimenter. Half of the subjects saw a display of keywords that occurred at least 8 times in the text; the other half saw a phrase display based on the same frequency threshold. Subjects were told only that these displays were computer-generated "suggestions".
Another simplification was to freeze the dimensions of the windows in which the three displays appeared; this was done mainly to avoid confusing novice subjects. Eliminated features also included spell-checking and various text-structuring capabilities. In addition, the package was modified so that the session would be terminated when the time limit was reached.
The resulting abstracts, and the times taken, were recorded automatically. Some additional information was gathered from the subjects by oral questionnaire: a short list of closed questions on subjects' backgrounds and reactions was followed by a single open-ended request for comments on the software, abstracting, and the experiment.
A letter of information provided to subjects defined an abstract as "a brief, objective, and accurate representation of the contents of a document"; this definition was derived from the ANSI standard for writing abstracts (ANSI, 1979). The letter also informed subjects that they were limited to no more than one hour to write the abstract and that the abstract should be no more than 250 words in length.
Before writing the abstract, each subject was given a brief demonstration of the software, entailing an oral review of an instruction sheet. The same instruction sheet was also available to the subject throughout the writing of the abstract.
A number of criteria were applied in choosing full texts for the experiments: (1) they must be readily available, with author abstracts, in ASCII code; (2) automatically reformatting them to fit into 40-character lines should not cause readability problems; (3) they should be scholarly but not excessively technical; (4) length should be approximately 2000 words.
The first selected text, from the area of education, showed the conventional parts of a scientific report: purpose, methodology, results, and conclusions. The second text, dealing with computer-mediated communication, fitted the pattern of a survey article.
For the experiments, each selected text was stripped of its title, authorship, abstract, references, and other peripheral elements, which were stored separately for future analysis.
The first text was used with 20 subjects, and the second text with 15.
SPSS for Windows (SPSS Inc., 1993) was used for statistical analysis. Questionnaire responses that fitted precoded categories were treated as ordinal values; novel responses were treated as missing values. Because of the number of ordinal variables and the irregular distributions of the interval variables, the Spearman correlation coefficient was generally employed to test relationships.
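The Spearman coefficient used throughout is simply the Pearson correlation applied to rank vectors, with tied values receiving the average of their rank positions. A self-contained sketch of that computation (not the SPSS implementation, and assuming non-constant inputs) is:

```python
def average_ranks(xs):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

The average-rank treatment of ties matters here because the questionnaire responses are coarse ordinal codes with many tied values.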
Overall, phrases did not perform significantly better than keywords in terms of perceived usefulness. When asked how good they thought their abstracts were, subjects again showed little difference between phrase and keyword displays.
The abstracts were analyzed to determine whether subjects presented with phrases tended to employ these phrases in their abstracts more than did the other subjects. The measure used was the number of occurrences of two consecutive words in the abstract that matched two consecutive words in the phrase display. Results suggested some of the expected effect: 7 of the phrase group's abstracts, but none of the keyword group's, showed more than 8 matches. Nevertheless, the difference was not statistically significant, and the abstracts of most subjects in the two groups showed an undifferentiated scattering. The author abstracts themselves showed 3 and 5 matches respectively.
Preference for phrasing from the full text was measured as the proportion, out of all pairs of consecutive words in the abstract, that were found also in the full text. Values ranged from 27.4% to 94.1%. The full-text-word-pair densities for the document authors' abstracts, at 42.1% and 42.4% respectively, were slightly toward the low end. A correlation was found here with previous Microsoft Windows experience, though not a statistically significant one. Correlation with previous computerized editing experience was weak. Density of full text phrasing showed little relation to abstracting experience.
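The word-pair (dyad) density measures can be reconstructed directly from the definition above: tokenize both texts, collect the source's consecutive word pairs, and compute the proportion of the abstract's consecutive pairs that also occur in the source. A sketch, assuming case-folding and a simple tokenizer (details the paper does not specify):

```python
import re

def _words(text):
    # Simple tokenizer; the paper's exact tokenization is not specified.
    return re.findall(r"[a-z']+", text.lower())

def dyad_density(abstract, source):
    """Proportion of the abstract's consecutive word pairs (dyads)
    that also occur somewhere in the source text."""
    a, s = _words(abstract), _words(source)
    source_pairs = set(zip(s, s[1:]))
    abstract_pairs = list(zip(a, a[1:]))
    if not abstract_pairs:
        return 0.0
    return sum(p in source_pairs for p in abstract_pairs) / len(abstract_pairs)
```

With the full text as source this yields the FULLDYAD measure; with the author abstract as source, AUABDYAD.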
A measure of the percent of abstract words found in the full text ranged from 45% to 99%, with the document authors' own abstracts at 78% and 77% respectively. For the first document, this measure showed a significant relation to Microsoft Windows experience, and a marginally significant relation to computerized editing experience in general. The overall correlation for both documents was positive, but not statistically significant.
Similarity to phrasing in the author's own abstract was also measured by word pair density. Values ranged from 1.3% to 18.5%. Little relation was found with previous abstracting experience. Experience with Microsoft Windows showed significant correlation with echoing author-abstract phrasing for the first document, but not in general.
Percent of subject-abstract words found in the document author's abstract ranged from 12% to 43%. This again did not correlate with abstracting experience. It also did not correlate with Microsoft Windows experience.
Responses to the question "How easy was it for you to use the software?" correlated, but not significantly, with previous Microsoft Windows experience and did not correlate noticeably with previous computerized editing.
Responses to the question "How good do you think your abstract is?" showed a limited range: only one subject answered "not at all good", and none "very good". Coded responses correlated with experience with abstracting, but not significantly.
Experience with abstracting showed only a very weak negative correlation with length of abstract, measured in characters. Nor did abstracting experience correlate with vocabulary diversity, as measured inversely by Simpson's l (Simpson, 1949).
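The paper does not state which estimator of Simpson's index was computed; a sketch of the simple sum-of-squared-proportions form (an assumption here), under which higher values indicate less diverse vocabulary, is:

```python
from collections import Counter
import re

def simpsons_l(text):
    """Simpson's l as the sum of squared word proportions.

    Ranges from 1/k (k distinct words used equally often) up to 1.0
    (a single word repeated), so diversity is measured inversely.
    """
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words)
    counts = Counter(words)
    return sum((c / n) ** 2 for c in counts.values())
```

Simpson (1949) also gives an unbiased sample form, sum of n_i(n_i - 1) / (N(N - 1)); either could have been used.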
Of 35 subject abstracts, 13 showed no spelling errors. As was to be expected, spelling error density correlated negatively with experience with computerized editing. It also correlated negatively with use of source-text phrases. There was a nonsignificant negative correlation with seconds remaining at completion.
Five subjects would have liked having the original on paper. Seven subjects commented favorably on the simultaneous display, but thirteen criticized the amount or shape of space given to the three windows: several mentioned wanting more space for either the full text or the abstract.
Ten subjects commented on problems with scrolling.
Four subjects noted that selected text remained selected even after being copied to the clipboard.
Free-form comments by subjects related to the phrase and keyword displays generally concurred with responses to the closed question on this topic.
Ten subjects noted problems with estimating length of texts. Six of these thought that a word count feature would be helpful. Four subjects commented on the lack of a spell checker.
Five of the 15 subjects who abstracted the survey article commented on difficulties in understanding it.
Four subjects made comments suggesting that they found the time of one hour too short.
Other hypotheses were favored by the data, but with weaker correlations that did not attain statistical significance: when presented with phrases, subjects tended to incorporate them in their abstracts; subjects with more Windows experience tended to find the software easier; and more experienced abstractors tended to think that their abstracts were better.
Analysis showed several hypothesized relations to be almost nonexistent; namely, that between perceived usefulness of suggestions and inclusion of phrases rather than keywords; and those between abstracting experience on the one hand and approximation to the author's own abstract, writing shorter abstracts, and using more diverse vocabulary on the other.
As soon as subjects' difficulties with scrolling became apparent, the full package was reprogrammed to provide smoother-scrolling displays. The version used for the experiment was left unchanged to provide uniformity.
The lack of immediate feedback from the copy command, including the retention of text selection, certainly caused problems, as also reported by Nielsen (1994).
Unlike the test software, the full package provides a single command to copy a selected text and paste it immediately into the abstract. This function might have been favorably received by some of the subjects. It does require, however, that the abstract insertion point be set correctly first.
An optional dynamic word-count display has since been introduced into the full package, for both abstract and full text. Some of subjects' difficulties with estimating word counts were perhaps due to unfamiliarity with scroll bars.
A spell checker was available in the full version. It was not incorporated in the test version for several reasons, including its incompleteness. The full version was subsequently modified to make use of a spell-checker, such as WinSpell (R&TH, 1994), that runs as a separate application but can monitor keystrokes.
It appears that the second document was somewhat more difficult than the first. Apart from intrinsic reading ease, survey articles are likely to be harder to abstract by the copy and paste method.
Most subjects in fact finished before the hour was up, and the quickest completed in less than 40 minutes. The time limit thus appears not to have been excessively restrictive, though also not excessively generous. We may compare the times for Van Dijk and Kintsch's subjects to produce a 60-to-80-word abstract of a 1600-word story on a computer console (Van Dijk & Kintsch, 1985).
More detailed analysis of the abstracts produced is possible. For example, Salager-Meyer (Salager-Meyer, 1991) studied basic content elements, order, and paragraphing in English-language medical abstracts. Kaplan and others (Kaplan et al., 1994) applied linguistic analysis to abstracts submitted for conference presentations and also divided them into accepted and non-accepted categories. Current plans call at least for having a sample of subjects' abstracts graded by an independent reviewer.
In studies involving think-aloud protocols (Endres-Niggemeyer, Waumans, & Yamashita, 1991), it has been noted that individuals use quite different approaches in writing abstracts. Differences in approach might correlate with properties of abstracts produced or with reactions to particular computerized tools.
Additional evaluative research should be undertaken by others: this should include research into the tools' efficiency and effectiveness in assisting in the real-life production of abstracts, as well as replication of the experimental situation with a variety of abstracts and abstractors. It is intended to make the software freely available to students and other interested individuals, for purposes of feedback and both formal and informal testing.
ANSI (1979). American national standard for writing abstracts (ANSI Z39.14-1979). New York: American National Standards Institute.
Burgin, R., & Dillon, M. (1992). Improving disambiguation in FASIT. Journal of the American Society for Information Science, 43 (2), 101-114.
Craven, T.C. (1988). Text network display editing with special reference to the production of customized abstracts. Canadian journal of information science, 13 (1/2), 59-68.
Craven, T.C. (1991a). Use of words and phrases from full text in abstracts. Journal of information science, 16, 351-358.
Craven, T.C. (1991b). Algorithms for graphic display of sentence dependency structures. Information processing and management, 27 (6), 603-613.
Craven, T.C. (1993). A computer-aided abstracting tool kit. Canadian journal of information science, 18 (2), 19-31.
Craven, T.C. (In press). Presentation of repeated phrases in a computer-assisted abstracting tool kit. Information processing and management.
Endres-Niggemeyer, B. (1994). Summarizing text for intelligent communication: results of the Dagstuhl Seminar. Knowledge organization, 21 (4), 213-223.
Endres-Niggemeyer, B., Waumans, W., & Yamashita, H. (1991). Modelling summary writing by introspection: a small-scale demonstrative study. Text, 11 (4), 523-552.
Fagan, J.L. (1989). The effectiveness of a nonsyntactic approach to automatic phrase indexing for document retrieval. Journal of the American Society for Information Science, 40 (2), 115-132.
Jones, L.P., Gassie, E.W., & Radhakrishnan, S. (1990). INDEX: the statistical basis for an automatic conceptual phrase-indexing system. Journal of the American Society for Information Science, 41 (2), 87-97.
Kaplan, R.B., Cantor, S., Hagstrom, C., Kamhi-Stein, L.D., Shiotani, Y., & Zimmerman, C.B. (1994). On abstract writing. Text, 14 (3), 401-426.
Kozma, R.B. (1991). The impact of computer-based tools and embedded prompts on writing processes and products of novice and advanced college writers. Cognition and instruction, 8 (1), 1-27.
Nielsen, J. (1994). Estimating the number of subjects needed for a thinking aloud test. International journal of human-computer studies, 41, 385-397.
Paice, C. (1990). Constructing literature abstracts by computer: techniques and prospects. Information processing and management, 26 (1), 171-186.
Paijmans, H. (1993). Comparing the document representations of two IR systems: CLARIT and TOPIC. Journal of the American Society for Information Science, 44 (7), 383-392.
Salager-Meyer, F. (1991). Medical English abstracts: how well are they structured?. Journal of the American Society for Information Science, 42 (7), 528-531.
Simpson, E.H. (1949). Measurement of diversity. Nature, 163, 688.
R & TH Inc. (1994). WinSpell version 3.08: the Windows spelling supervisor. Richardson, Texas: R & TH Inc.
SPSS Inc. (1993). SPSS for Windows, Release 6.0 (Jun 17 1993).
Tibbo, H.R. (1992). Abstracting across the disciplines: a content analysis of abstracts from the natural sciences, the social sciences, and the humanities with implications for abstracting standards and online information retrieval. Library and information science research, 14 (1), 31-56.
Van Dijk, T., & Kintsch, W. (1985). Cognitive psychology and discourse: recalling and summarizing stories. In H. Singer & R.B. Ruddell (Eds.), Theoretical models and processes of reading, third edition (pp. 794-812). Newark, Delaware: International Reading Association.
Variable | Description |
---|---|
AUABDYAD | Proportion of 2-word sequences in abstract found in author abstract |
AUABWORD | Proportion of words in abstract found in author abstract |
BYTES | Length of abstract in bytes |
COMPEDIT | "How familiar are you with using a computer for editing text?" |
EXABOTHE | "How much experience have you had in writing abstracts of documents written by other people?" |
FULLDYAD | Proportion of 2-word sequences in abstract found in full text |
FULLWORD | Proportion of words in abstract found in full text |
GOODABST | "How good do you think your abstract is?" |
MISSPDEN | Proportion of misspelled words |
PHRADYAD | Number of 2-word sequences in abstract found in phrase display |
SEC.REMA | Seconds remaining at completion |
SIMPSONL | Simpson's l of abstract words |
SOFTEASY | "How easy was it for you to use the software?" |
SUGGESTI | Type of information in Suggestions window |
SUGGUSEF | "How useful did you find the information in the Suggestions window?" |
WINDOWS | "How familiar are you with the Microsoft Windows environment?"
Variables correlated | Spearman correlation | Significance |
---|---|---|
SUGGESTI - SUGGUSEF | 0.1076 | 0.551 |
SUGGESTI - GOODABST | 0.1238 | 0.507 |
SUGGESTI - PHRADYAD | 0.3005 | 0.079 |
FULLDYAD - WINDOWS | 0.2632 | 0.139 |
FULLDYAD - COMPEDIT | 0.1243 | 0.491 |
FULLDYAD - EXABOTHE | 0.1163 | 0.506 |
FULLWORD - COMPEDIT (document 1) | 0.2903 (0.4868) | 0.101 (0.040)
FULLWORD - WINDOWS | 0.5987 | 0.007 |
AUABDYAD - EXABOTHE | 0.0650 | 0.711 |
AUABDYAD - WINDOWS (document 1) | 0.2292 (0.5725) | 0.200 (0.010) |
AUABWORD - EXABOTHE | 0.0681 | 0.697 |
AUABWORD - WINDOWS | -0.1394 | 0.439 |
SOFTEASY - WINDOWS | 0.2764 | 0.126 |
SOFTEASY - COMPEDIT | 0.0577 | 0.758 |
EXABOTHE - GOODABST | 0.2251 | 0.223 |
EXABOTHE - BYTES | -0.1070 | 0.540 |
EXABOTHE - SIMPSONL | -0.0265 | 0.880 |
MISSPDEN - COMPEDIT | -0.4149 | 0.016 |
MISSPDEN - FULLDYAD | -0.4886 | 0.003 |
MISSPDEN - SEC.REMA | -0.1283 | 0.463 |
Response | Phrases subjects | Keywords subjects |
---|---|---|
"not at all useful" | 1 | 5 |
"not very useful" | 9 | 5 |
"quite useful" | 4 | 5 |
"very useful" | 2 | 2 |
other | 2 | 0 |
© 1996, American Society for Information Science. Permission to copy and distribute this document is hereby granted provided that this copyright notice is retained on all copies and that copies are not altered.